Fable is a big step in the wrong direction

Don't be too upset that Fable is not available outside the US. Fable beats previous models in benchmarks, but represents a big step in the wrong direction.

Anthropic just released Fable, their most powerful model yet. Almost immediately after the release, the US Department of Justice had Anthropic disable it for all customers outside of the US. There is no real evidence to suggest that Fable poses a significantly larger security threat than existing models. This seems to be a case of Anthropic's fearmongering catching up with them. If you missed the window to try Fable this time around, you will likely get the chance again soon.

Fable is, by any measure, the smartest model Anthropic has released to date. But rather than marking a new milestone in machine intelligence, it feels like an incremental improvement on the models that are already on the market. For most work, those existing models will produce results that are just as good as Fable's. While Fable might lead on "intelligence", it is significantly worse in more important areas.

2x - 40x more expensive

Fable is priced at $10/M input tokens and $50/M output tokens. This is twice the cost of Anthropic’s previous flagship, Opus. Furthermore, Fable will not be available on the Claude Pro and Max subscriptions, which offer massive discounts on token costs. Once those discounts are factored in, the cost difference between the latest Opus model and Fable for the same task can be as much as 40x. Cost is arguably one of the biggest issues with AI coding agents today. These models are undoubtedly useful, but they often do not provide enough value to justify the price.
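To make the list-price arithmetic concrete, here is a minimal sketch in Python. The Fable prices are the ones quoted above. The Opus prices are inferred from the "twice the cost" statement, and the token counts describe a made-up task, not a measurement. The sketch only covers pay-per-token pricing, so it shows the 2x end of the range; subscription pricing is not modeled.

```python
# Prices in USD per million tokens.
FABLE = {"input": 10.0, "output": 50.0}  # from the pricing quoted above
OPUS = {"input": 5.0, "output": 25.0}    # assumption: half of Fable's price


def task_cost(prices, input_tokens, output_tokens):
    """Return the pay-per-token cost in USD for one task."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Hypothetical agent task: 200k tokens read, 20k tokens written.
fable_cost = task_cost(FABLE, 200_000, 20_000)
opus_cost = task_cost(OPUS, 200_000, 20_000)

print(f"Fable: ${fable_cost:.2f}")
print(f"Opus:  ${opus_cost:.2f}")
print(f"Ratio: {fable_cost / opus_cost:.1f}x")
```

Because both prices double, the ratio stays at 2x whatever mix of input and output tokens the task uses.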

Take the recent example of Uber. In the first 3 months of 2026, they spent their entire yearly budget on tokens. When their COO investigated, he was unable to draw a line between this massive additional expense and any noticeable increase in productivity or new features.

Microsoft recently started cancelling their Claude Code subscriptions after finding out that using AI was actually more expensive than using human employees.

If you work for a smaller company, you might feel like you are getting a lot of value for money with Claude Pro or Claude Max. For smaller companies with under 150 employees, these subscription costs are heavily subsidized. The plans are designed to get more people using Claude Code, and they are costing Anthropic billions of dollars each year.

Microsoft recently changed the pricing for Copilot so that customers now pay for usage. As a result, one user's monthly bill went from $30 to $3,000.

The fact that Fable will not be available on these plans is a good indicator that Anthropic is planning to go the same way, likely after their IPO later this year.

The most underrated metric

There are more AI benchmarks than anyone can possibly keep up with. Every time a new model is released, it references scores on a benchmark you almost certainly haven’t heard about before. But for all the countless ways to measure a model’s capabilities, one is almost always ignored: speed. How is it that an industry that seems obsessed with making software development faster never talks about the speed of different models?

Fable scores 82% on the Terminal Bench (verified) benchmark. In comparison, Google’s Gemini 3.5 Flash scores 76%, so Fable completes 7.9% more tasks than Gemini 3.5 Flash. However, Flash outputs an average of 133 tokens per second, compared to Fable’s 39 t/s.

This roughly 3.4x difference is huge. Fable might be able to complete 7.9% more tasks on the first try, but it will take about 3.4x longer and cost over 5x as much to do so. And that’s when solving tasks specifically designed to challenge the models’ capabilities. In a normal coding setting, this difference will be even more pronounced.
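Recomputing the ratios from the four figures quoted above is a quick sanity check. The sketch below uses only those scores and token rates. The 4,000-token response length is an invented value, included only to turn tokens per second into wall-clock time.

```python
fable_score, flash_score = 82, 76  # benchmark scores in percent
fable_tps, flash_tps = 39, 133     # output tokens per second

# Relative gain: how many more tasks Fable solves, as a share of Flash's total.
relative_gain = fable_score / flash_score - 1

# Speed ratio: how many times faster Flash produces output.
speed_ratio = flash_tps / fable_tps

# Hypothetical response length, used only for illustration.
response_tokens = 4_000
fable_seconds = response_tokens / fable_tps
flash_seconds = response_tokens / flash_tps

print(f"Relative gain in tasks solved: {relative_gain:.1%}")
print(f"Speed ratio (Flash / Fable):   {speed_ratio:.1f}x")
print(f"Fable: {fable_seconds:.0f}s, Flash: {flash_seconds:.0f}s")
```

Note that this only measures raw output speed. It says nothing about how many tokens each model needs to finish a given task, which would also affect the total time.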

The output speed of the frontier models we have today means that developers often spend more time waiting for the model to finish than on prompting and reviewing code. Current attempts to address this issue involve ignoring every single lesson we, as an industry, have learned about software development over the last 30 years.

It is always better to halve the time it takes to do a task than to do twice as many at the same time

People are trying to run multiple agents per developer, leading to rapid task switching and causing increased stress and burnout. We are regressing to “spec-driven development”, where the developer spends time writing a carefully detailed spec, almost at the level of code, so that the AI agent is more likely to get it right on the first attempt. But these initiatives are not innovative new ways of working with AI. They are a desperate attempt to solve the real issue: AI agents are not fast enough. We are treating the symptoms instead of the underlying problem.

We already know that this is not how you increase productivity in a software team. It was the core message of the Agile Manifesto and books such as The Mythical Man-Month. If you want to move fast, break big tasks into smaller tasks and iterate quickly.
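A small worked example shows why halving the time per task beats doubling the number of parallel tasks. This is a toy model with invented durations, and it ignores the cost of task switching, which would only widen the gap. The function names are mine, chosen for illustration.

```python
def completion_times_parallel(task_minutes, n_tasks):
    """All tasks run side by side; each finishes after the full duration."""
    return [task_minutes] * n_tasks


def completion_times_sequential(task_minutes, n_tasks):
    """Tasks run one after another; task k finishes after k durations."""
    return [task_minutes * (k + 1) for k in range(n_tasks)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical numbers: a slow agent needs 10 minutes per task,
# an agent that is twice as fast needs 5 minutes.
parallel = completion_times_parallel(10, 2)     # two slow agents at once
sequential = completion_times_sequential(5, 2)  # one fast agent, one task at a time

print("parallel:  ", parallel, "mean", mean(parallel))
print("sequential:", sequential, "mean", mean(sequential))
```

Both setups finish the second task after 10 minutes, so throughput is identical. The fast sequential setup delivers the first result after 5 minutes instead of 10, so feedback arrives sooner and the average wait drops from 10 to 7.5 minutes.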

What can we do now?

For a while now I have been using a very simple workflow when working with AI agents. Start with a fast model (I mostly use Gemini 3.5 Flash or Cursor’s composer models). Break down complex tasks and ask the agent to solve one thing at a time. Give enough context so the model knows what it is working on, but don’t overdo it. Iterate with the model while solving the task. If I come across a task that the fast model cannot solve, I will either do it myself or switch to a frontier model while getting another cup of coffee. In practice, I almost never switch models.
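The workflow above can be sketched as a simple escalation loop. This is an illustration under assumptions, not a real integration: the "models" are plain Python callables standing in for whatever tool you use, and the retry limit of three is an arbitrary choice. In practice, the iteration happens in conversation rather than in a loop.

```python
from typing import Callable, Optional, Tuple

# A "model" is any callable that takes a task description and returns a
# patch, or None when it gives up. These are stand-ins, not a vendor API.
Model = Callable[[str], Optional[str]]


def solve(task: str, fast: Model, frontier: Model,
          max_fast_attempts: int = 3) -> Tuple[str, str]:
    """Try the fast model a few times, then escalate to the frontier model."""
    for _ in range(max_fast_attempts):
        result = fast(task)
        if result is not None:
            return "fast", result
    result = frontier(task)
    if result is not None:
        return "frontier", result
    return "human", ""  # neither model solved it; do it by hand


# Stub models for demonstration only.
HARD_TASKS = {"untangle the legacy billing module"}


def fast_stub(task: str) -> Optional[str]:
    return None if task in HARD_TASKS else f"patch: {task}"


def frontier_stub(task: str) -> Optional[str]:
    return f"patch: {task}"


# One small step at a time, rather than one large prompt.
steps = ["rename the config loader", "untangle the legacy billing module"]
for step in steps:
    solved_by, _ = solve(step, fast_stub, frontier_stub)
    print(f"{step} -> {solved_by}")
```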

For many people this might seem like a step back from running a swarm of parallel agents, but in reality it is much more productive. The faster model means that the majority of tasks get solved 3–4x faster with very few iterations. My mind is always on the task at hand, so I don’t have to pay the cost of task switching. This means that I don’t get fatigued as quickly and don’t need as many breaks. Since changes happen in smaller increments, it is much easier for me to keep up with what the AI agent is doing, so I don’t have to wait until the end of a task to review the code.

We need to start evaluating AI models on more than just their benchmark scores. Cost and speed are just as important, if not more so. Instead of compromising our work processes to accommodate AI models, we should ask for AI models that support the way we want to work.

Companies have rushed to adopt AI faster than any other technology in history. They have poured billions of dollars into AI, transformed organizations, delayed hiring, and even made "token spend" a metric for success. When asked how this massive investment has impacted their bottom line, nobody seems to have an answer.

For years now AI companies have been selling us a story about where AI is going. The things that will be possible in 6 months or a year. The productivity that will be unlocked and the people that will be replaced. But AI coding is not a theoretical future. It is here right now. We don’t need another promise or an overhyped model at a heavily inflated price point. We need reliable, faster, and more cost-efficient tools.